Class-specific Word Embedding through Linear Compositionality
Authors
Abstract
The English linguist John Rupert Firth famously said, “You shall know a word by the company it keeps.” Most word representation learning models rest on this assumption: a word’s semantic meaning can be learned from the context in which it appears. The context is defined as a small, unordered set of words surrounding the target word. Research has shown that context alone provides limited information because it contains only neighboring words, so the resulting word embeddings capture only local information. Some work tries to improve on this by drawing on outside information sources such as a knowledge base. We observe that the meaning of a word in a sentence can be better interpreted when the class information, or label, of the sentence is available. We propose three approaches for training class-specific embeddings that encode class information by exploiting the linear compositionality property of word embeddings. We present a general framework, consisting of a pair of convolutional neural networks, for text classification tasks in which the learned class-specific embeddings serve as features. We evaluate our approach and framework on topic classification of a disaster-focused Twitter dataset and on a benchmark Twitter sentiment classification dataset from SemEval 2013. Our results show a potential relative accuracy improvement of more than 5% over a recent baseline.
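As a rough illustration of the linear compositionality property mentioned in the abstract, the sketch below sums word vectors to form a phrase vector and compares it with other words by cosine similarity. The three-dimensional vectors, the `EMBEDDINGS` table, and the `compose` and `cosine` helpers are all invented for this example; it does not reproduce the three training approaches or the convolutional framework of the paper.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# Real embeddings would come from a trained model.
EMBEDDINGS = {
    "flood": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "happy": [0.0, 0.1, 0.9],
}


def compose(words, table):
    """Additive composition: element-wise sum of the words' vectors."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(table[word]):
            total[i] += value
    return total


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Compose a phrase vector from two related words.
phrase = compose(["flood", "water"], EMBEDDINGS)

# Compare the phrase vector with a related and an unrelated word.
sim_related = cosine(phrase, EMBEDDINGS["flood"])
sim_unrelated = cosine(phrase, EMBEDDINGS["happy"])
print(f"phrase vs flood: {sim_related:.3f}")
print(f"phrase vs happy: {sim_unrelated:.3f}")
```

With these toy values the summed vector stays close in angle to its constituent words and far from the unrelated one, which is the kind of behavior that additive composition relies on.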
Similar resources
Non-Linear Similarity Learning for Compositionality
Many NLP applications rely on the existence of similarity measures over text data. Although word vector space models provide good similarity measures between words, phrasal and sentential similarities derived from the composition of individual words remain a difficult problem. In this paper, we propose a new method of non-linear similarity learning for semantic compositionality. In this metho...
Geometry of Compositionality
This paper proposes a simple test for compositionality (i.e., literal usage) of a word or phrase in a context-specific way. The test is computationally simple: it relies on no external resources and uses only a set of trained word vectors. Experiments show that the proposed method is competitive with the state of the art and displays high accuracy in context-specific compositionality detection of a v...
A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions
This paper presents the first attempt to use word embeddings to predict the compositionality of multiword expressions. We consider both single- and multi-prototype word embeddings. Experimental results show that, in combination with a back-off method based on string similarity, word embeddings outperform a method using count-based distributional similarity. Our best results are competitive with, ...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Skip-Gram - Zipf + Uniform = Vector Additivity
In recent years, word-embedding models have gained great popularity due to their remarkable performance on several tasks, including word analogy questions and caption generation. An unexpected “side-effect” of such models is that their vectors often exhibit compositionality, i.e., adding two word-vectors results in a vector that is only a small angle away from the vector of a word representing th...